Effects of Distance between Classes and Training Datasets Size to the Performance of XCS: Case of Imbalance Datasets

Authors

  • Thach H. Nguyen
  • Sombut Foitong
  • Sornchai Udomthanapong
  • Ouen Pinngern

Abstract

This paper analyzes how the distance between classes and the size of the training dataset affect the XCS classifier system on imbalanced datasets. Our purpose is to answer whether the performance loss the classifier suffers on class-imbalanced problems stems from the class imbalance per se or can be explained in other ways. Experiments on 250 artificial imbalanced datasets show that XCS can perform well in some imbalanced domains if the training dataset is large enough and the distance between classes is appropriate. It therefore does not seem fair to attribute the performance loss of XCS directly to dataset imbalance. This research also identifies which kinds of datasets are suitable for training XCS and shows that dealing with class imbalance alone will not always improve classifier performance.
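The abstract does not say how the 250 artificial datasets were generated. To make the three factors it varies (training set size, distance between classes, and imbalance) concrete, here is a minimal hypothetical generator in Python. It assumes two unit-variance Gaussian classes whose means lie `class_distance` apart along the first feature. The function name and parameters are illustrative assumptions, not the authors' setup. Real-valued inputs like these would also need interval conditions (as in XCSR) or discretization before a binary-input XCS could use them.

```python
import random
from typing import List, Tuple


# Hypothetical generator (not the authors'): two unit-variance Gaussian
# classes whose means lie `class_distance` apart along the first feature.
def make_imbalanced_dataset(
    n_samples: int,
    minority_fraction: float,
    class_distance: float,
    n_features: int = 2,
    seed: int = 0,
) -> List[Tuple[List[float], int]]:
    """Return (features, label) pairs; label 1 marks the minority class."""
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    if not 0.0 < minority_fraction < 0.5:
        raise ValueError("minority_fraction must lie in (0, 0.5)")
    rng = random.Random(seed)
    # At least one minority example; the remainder go to the majority class.
    n_minority = max(1, round(n_samples * minority_fraction))
    n_majority = n_samples - n_minority
    data: List[Tuple[List[float], int]] = []
    # Majority class centred at the origin, minority shifted along axis 0,
    # so the Euclidean distance between the class means equals class_distance.
    for label, count, shift in ((0, n_majority, 0.0), (1, n_minority, class_distance)):
        for _ in range(count):
            x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            x[0] += shift
            data.append((x, label))
    # Shuffle so an online learner does not see all of one class first.
    rng.shuffle(data)
    return data


if __name__ == "__main__":
    # Illustrative grid over size and class distance, with the imbalance
    # level held fixed at 10 % minority examples.
    for size in (100, 1000):
        for distance in (0.5, 2.0):
            d = make_imbalanced_dataset(size, 0.1, distance)
            n_min = sum(label for _, label in d)
            print(f"size={size} distance={distance} minority={n_min}")
```

Holding `minority_fraction` fixed while sweeping `n_samples` and `class_distance` is one way to separate the effects of size and class overlap from the effect of imbalance itself, which is the distinction the abstract draws.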

Similar resources

Matching of Polygon Objects by Optimizing Geometric Criteria

Unlike semantic criteria, geometric criteria perform differently in polygon feature matching across different vector datasets. Using the same criteria to measure the similarity of two polygons in all matchings would not yield equally good results. To achieve the best matching results, the optimal geometric criteria must be determined for each dataset....

Role of Heuristic Methods with variable Lengths In ANFIS Networks Optimum Design and Training

ANFIS systems have received considerable attention because of their acceptable performance in building and training fuzzy classifiers. One main challenge in designing an ANFIS system is finding an efficient method with high accuracy and good interpretability. Undoubtedly, the type and location of the membership functions and the way an ANFIS network is trained have a considerable effect on...

A Fuzzy-Evolutionary Method for Software Defect Detection

Software defect detection is one of the most important challenges in software development and its most costly process. Early detection of fault-prone modules helps software project managers allocate developers' limited cost, time, and effort to testing the defect-prone modules more intensively.  In this paper, according to the importance of soft...

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Much of the research in this context has focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...

INDUCING VALUABLE RULES FROM IMBALANCED DATA: THE CASE OF AN IRANIAN BANK EXPORT LOANS

Publication date: 2007